Urban Population Data Analysis

This notebook analyzes global urbanization trends, life expectancy, GDP per capita, and their relationships using data visualizations.

Author

Piyush Rajendra Jain

Published

April 27, 2025

Explaining the Graphs: Urbanization, Wealth, and Life

Let’s take a journey through urban development, global wealth, and human well-being — all using data visualization!

Understanding the Modern World: Urbanization, Wealth, and Human Well-being

The past decades have witnessed tremendous shifts in where and how humans live. Urbanization — the movement of humans from rural to urban areas — has fundamentally altered countries economically, socially, and even environmentally.

But urbanization is more than cities growing taller. It is intimately linked with economic prosperity and human well-being.

  • Do richer countries urbanize more?
  • Does city living really lead to a longer, healthier life?

In this project, we explore these questions by looking at global data on:

  1. Urban population percentages,
  2. GDP per capita, and
  3. Life expectancy

In a series of visualizations, we show how these powerful forces become intertwined to shape our world today.

# Importing all the required packages and libraries to efficiently complete the project
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd

from plotnine import *
import ipywidgets as widgets
from IPython.display import display
from shapely.geometry import Polygon

# Read the cleaned UNICEF CSV (uploaded to Colab) into urban_df, the DataFrame used by every cell below
urban_df = pd.read_csv('cleaned_unicef_data.csv')
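
Before plotting, it can help to confirm that the file contains the columns the later cells rely on. The sketch below is one possible check, not part of the original notebook: the column names are the ones the plotting code references, `missing_columns` is a hypothetical helper, and the small DataFrame is a made-up stand-in for the real CSV.

```python
import pandas as pd

# Columns referenced by the plotting cells in this notebook.
REQUIRED_COLUMNS = [
    "country",
    "year",
    "share_of_urban_population",
    "life_expectancy_at_birth,_total_(years)",
]

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]

# Made-up stand-in for the real data; pass the loaded DataFrame instead.
sample = pd.DataFrame({
    "country": ["A", "B"],
    "year": [2020, 2020],
    "share_of_urban_population": [55.0, 80.0],
})

# Lists any column the later cells would fail on.
print(missing_columns(sample))
```

An empty list means every expected column is present.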

Graph 1: Choropleth Map of Urban Population Share

Here is a world map tinted by the percentage of each country’s population living in cities.

  • Darker countries are more urbanized (a larger share of people live in cities).
  • Lighter countries are less urbanized.

This gives us a bird’s-eye view: where are people concentrating in cities, and which places are still largely rural?

But urbanization does not happen in isolation. It shapes daily life, sometimes subtly, sometimes fundamentally. So naturally we ask:

Does urban living really improve your quality of life?

To explore that…

!wget -q https://github.com/nvkelso/natural-earth-vector/raw/master/geojson/ne_110m_admin_0_countries.geojson -O countries.geojson

map_data = gpd.read_file("countries.geojson")
map_data.rename(columns={'ADMIN': 'country'}, inplace=True)

merged = pd.merge(map_data, urban_df[['country', 'share_of_urban_population']], on='country', how='left')


map_df = merged.explode(index_parts=False).reset_index(drop=True)
map_df = map_df[map_df['geometry'].notnull() & map_df['geometry'].apply(lambda g: g.geom_type == 'Polygon')]

def extract_coords(row):
    x, y = row.geometry.exterior.coords.xy
    return pd.DataFrame({
        'x': x,
        'y': y,
        'country': row['country'],
        'share_of_urban_population': row['share_of_urban_population'],
        'group': row.name
    })

plot_df = pd.concat([extract_coords(row) for _, row in map_df.iterrows()], ignore_index=True)

world_map = (
    ggplot(plot_df, aes(x='x', y='y', group='group', fill='share_of_urban_population')) +
    geom_polygon(color='black', size=0.1) +
    scale_fill_gradient(
        low="#f0f9e8", high="#084081",
        name="Urban Population Share (%)",
        na_value="lightgrey"
    ) +
    coord_equal() +
    theme_void() +
    labs(
        title="Urban Population Share by Country",
        subtitle="Countries with missing data shown in light grey"
    ) +
    theme(
        figure_size=(15, 8),
        plot_title=element_text(size=16, weight='bold'),
        plot_subtitle=element_text(size=12, color='gray'),
        legend_title=element_text(size=10),
        legend_position='right'
    ))

display(world_map)
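
Because the map and the data are joined on country name, any spelling difference between the two sources leaves that country grey. One way to spot such cases is pandas' `indicator=True` option on `merge`. The sketch below uses made-up names and values purely for illustration; whether the real files actually disagree on any name is an assumption to verify against your own data.

```python
import pandas as pd

# Made-up stand-ins: names as they might appear in the map file and in the data file.
map_names = pd.DataFrame({"country": ["France", "United States of America", "Kenya"]})
data_names = pd.DataFrame({
    "country": ["France", "United States", "Kenya"],
    "share_of_urban_population": [81.0, 83.0, 28.0],  # illustrative values only
})

# indicator=True adds a "_merge" column recording where each row came from.
check = pd.merge(map_names, data_names, on="country", how="left", indicator=True)

# Rows marked "left_only" exist in the map but found no match in the data,
# so they would be drawn in the missing-data colour.
unmatched = check.loc[check["_merge"] == "left_only", "country"].tolist()
print(unmatched)
```

Names that show up in `unmatched` can then be renamed in one of the two tables before merging.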

Graph 2: Top 10 Countries by Life Expectancy (2020)

We shift our focus to life expectancy — an important measure of well-being.

This bar chart shows the 10 countries with the highest life expectancy in 2020. We can start spotting a pattern: many of these countries are also highly urbanized, suggesting a potential link.

But just looking at the top countries isn’t enough. We need to see the relationship across the whole world.

# Filter for 2020
data_2020 = urban_df[urban_df['year'] == 2020].dropna(subset=['life_expectancy_at_birth,_total_(years)'])

# Group by country and calculate mean life expectancy
avg_lifeexp = (
    data_2020.groupby("country")["life_expectancy_at_birth,_total_(years)"]
    .mean()
    .reset_index(name="avg_life_expectancy")
)

top10 = avg_lifeexp.nlargest(10, "avg_life_expectancy")

# Plot
plt.figure(figsize=(10, 6))
sns.barplot(data=top10, y="country", x="avg_life_expectancy", palette="viridis")
plt.title("Top 10 Countries by Life Expectancy (2020)")
plt.xlabel("Average Life Expectancy (Years)")
plt.ylabel("Country")
plt.tight_layout()
plt.show()

Graph 3: Scatter Plot - Urban Population vs Life Expectancy

Here, every point represents a country. We plot urbanization on the x-axis and life expectancy on the y-axis.

The trend line shows us that more urbanized countries tend to have higher life expectancies!

It’s not a perfect correlation, but the positive relationship is clear.

At this point, we realize wealth might be a hidden player in this story. So we ask:

Maybe richer countries are both more urbanized and healthier?

To dig deeper…

scatter_plot = (
    ggplot(urban_df, aes(x='share_of_urban_population', y='life_expectancy_at_birth,_total_(years)')) +
    geom_point(
        color="#0072B2",
        size=1,
        alpha=0.6
    ) +
    geom_smooth(
        method='lm',
        color="#D55E00",
        linetype="solid",
        size=1.5,
        se=True,
        fill="#FFDAB9"
    ) +
    labs(
        title='Urbanization and Life Expectancy',
        subtitle='Countries grouped by share of urban population',
        x='Share of Urban Population (%)',
        y='Life Expectancy at Birth (Years)',
        caption='Source: UNICEF'
    ) +
    theme_minimal(base_size=15) +
    theme(
        plot_title=element_text(weight='bold', size=20, ha='center', margin={'b': 10}),
        plot_subtitle=element_text(size=14, ha='center', margin={'b': 15}),
        axis_title_x=element_text(size=14, margin={'t': 10}),
        axis_title_y=element_text(size=14, margin={'r': 10}),
        axis_text=element_text(size=12),
        panel_grid_major=element_line(color='#e0e0e0'),
        panel_grid_minor=element_blank(),
        figure_size=(10, 10)
    )
)

display(scatter_plot)
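
The trend line shows the direction of the relationship, but a single number can summarize how tight it is. A minimal sketch using pandas' `Series.corr` (Pearson by default) on made-up values; the numbers are illustrative only, and the same call on the two columns of `urban_df` would give the real figure.

```python
import pandas as pd

# Made-up values standing in for the two columns plotted above.
toy = pd.DataFrame({
    "share_of_urban_population": [20.0, 35.0, 50.0, 65.0, 80.0, 95.0],
    "life_expectancy_at_birth,_total_(years)": [58.0, 66.0, 63.0, 74.0, 72.0, 81.0],
})

# Pearson correlation: +1 is a perfect positive linear relationship, 0 is none.
r = toy["share_of_urban_population"].corr(
    toy["life_expectancy_at_birth,_total_(years)"]
)
print(f"Pearson r = {r:.2f}")
```

A value between 0 and 1 matches the reading of the scatter plot: positive, but not perfect.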

Graph 4: Correlation Heatmap

This colorful heatmap shows correlations between different variables:

  • Urbanization

  • GDP per capita

  • Life expectancy

Strong positive or negative colors highlight where tight relationships exist. This gives us a bird’s-eye view of how all these factors interact!
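
A heatmap of this kind could be produced roughly as follows. This is a sketch under stated assumptions, not the notebook's original cell: `gdp_per_capita` is an assumed column name (the real dataset may label GDP differently), and the DataFrame holds made-up values standing in for `urban_df`.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up stand-in for urban_df. "gdp_per_capita" is an assumed column name.
toy = pd.DataFrame({
    "share_of_urban_population": [20.0, 35.0, 50.0, 65.0, 80.0, 95.0],
    "gdp_per_capita": [900.0, 2500.0, 4000.0, 15000.0, 30000.0, 52000.0],
    "life_expectancy_at_birth,_total_(years)": [58.0, 66.0, 63.0, 74.0, 72.0, 81.0],
})

# Shorter labels for the heatmap axes.
labels = {
    "share_of_urban_population": "Urbanization",
    "gdp_per_capita": "GDP per capita",
    "life_expectancy_at_birth,_total_(years)": "Life expectancy",
}

# Pairwise Pearson correlations between the three variables.
corr = toy[list(labels)].rename(columns=labels).corr()

# Fix the colour scale to [-1, 1] so that colours are comparable across cells.
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation Heatmap")
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, add `plt.show()`.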

Finally, we tie everything back to our original hypothesis…

Conclusion

Key Takeaways:

Our visual journey reveals several important insights:

  • Urbanization is unevenly distributed across the globe, though it has been rising in most regions.

  • Countries with higher urban populations tend to enjoy longer life expectancies, suggesting better access to healthcare, education, and services.

  • Economic prosperity (GDP per capita) is strongly linked to both urbanization and life expectancy.

Wealthier, more urbanized nations generally offer better living standards for their populations.

Together, the data paints a clear picture: Urbanization, when combined with economic development, can be a powerful driver of better human outcomes.

However, the benefits are not automatic — thoughtful planning, inclusivity, and sustainability are crucial to ensuring that urban growth leads to a healthier and more equitable world for all.